1. INTRODUCTION


The design of signals for communication and radar systems is presently more of an art than a science. Many excellent signals have evolved and valuable theories are available, but these are special cases which do not lead to satisfactory generalizations. Therefore, to meet an obvious need, we shall introduce a methodology for signal design which encompasses a very broad class of problems, enables the incorporation of signal constraints and design criteria in a natural manner, and provides a formulation well adapted to modern computer techniques.

The signal design theory presently employed appears to be based primarily on a priori parameterizations of the signal. Optimization is then performed over the values of the parameters (see, for example, Refs. 1, 2, 3). More general results rely on optimization by inspection of special cases (see, for example, Ref. 4). Signal constraints are rarely an integral part of the development, and the criterion of optimality is often vague. The development of a general theory of optimum signal design has been handicapped by the analysis techniques employed. Frequency domain concepts are impractical for most nonstationary problems. The Karhunen-Loève expansion (see Ref. 5) is a fine theoretical tool but often obscures the real issues by its generality. In addition, it often leads to integral equations, which are usually not well suited to computer investigation. Several years ago, control theory was in this same situation, as it relied on frequency domain techniques, optimization of parameterized systems, and a wide variety of special purpose approaches. However, a return to the time domain and the state variable concepts of classical physics, together with an extension of the calculus of variations called Pontryagin's Maximum Principle, provided the basis for a general approach which is presently called optimal control theory. Nonstationary stochastic processes became a minor extension when interest was confined to finite order Markov processes, a model of sufficient generality for most problems. The resulting differential and difference equations are well adapted to computer investigation.

This report merely marries the concepts of optimal control theory to the design of signals for communication and radar systems. The only addition required is the incorporation of statistical decision theory in an appropriate form. We do not attempt full consummation of the wedding but rather use examples to indicate possible progeny.
The present theory is an outgrowth of studies on the design of optimum amplitude and frequency modulations for a radar. The amplitude modulation work was done with Donald Gray.* These results are presently being documented and will be available shortly as explicit examples of how the theory can be employed in realistic problems.

Dr. Robert Price† introduced the author to communication applications, motivated the present writing, and provided many valuable comments.

Scalars are denoted by lower case letters, with or without subscripts. Vectors are underlined lower case letters, with or without subscripts. Matrices are denoted by capital letters. The exceptions to these rules are so indicated in the text. All vectors are column vectors. Transpose is indicated by a prime. "| |" denotes the determinant and "tr" the trace of a matrix. Henceforth, the term waveform applies to the signal to be designed, and x always refers to this waveform. The quantities observed by the receiver are always denoted by z. Covariance matrices are denoted by Σ and variances by σ², with appropriate subscripts. We use the same symbol for both a random variable and a sample of the random variable and rely on the text to furnish the necessary distinction. Discrete white noise is a sequence of zero mean, stationary, independent Gaussian random variables. Continuous time white noise stochastic processes are zero mean, stationary Gaussian processes with uniform spectrum and are employed in a nonrigorous manner.


* MIT, Lincoln Laboratory (Summer Staff).
† MIT, Lincoln Laboratory.
2. TWO EXAMPLES


A simple communication problem and a simple radar problem are stated. These two examples will be carried through the general development to provide a vehicle for illustrating the methodology of waveform design and indicating its generality.

Consider the following communication problem. A receiver knows that one of M waveforms

$$x_k(t), \qquad 0 \le t \le T, \qquad k = 1, \ldots, M \qquad (2.1)$$

was transmitted. The receiver observes the transmitted waveform corrupted by additive noise and must decide which of the M waveforms was transmitted. Let z(t) be the observed signal. Then

$$z(t) = \sum_{k=1}^{M} \alpha_k x_k(t) + \eta(t) \qquad (2.2)$$

where η(t) is the observation noise and

$$\alpha_k = \begin{cases} 0, & k \ne k_0 \\ 1, & k = k_0 \end{cases}$$

where $x_{k_0}(t)$ is the waveform actually transmitted. Assume η(t) is white noise. The choice of possible $x_k(t)$ is restricted by constraints on total energy and bandwidth. It is desired to choose the set of functions $x_k(t)$, k = 1, ..., M, which provides the best probability of making the correct decision as to which $\alpha_k = 1$.

Consider the following radar problem. A target is moving with respect to a radar at a range μ(t) given by

$$\mu(t) = \alpha_1 + \alpha_2 t \qquad (2.3)$$

where $\alpha_1$ and $\alpha_2$ are the range and range rate at time zero.* Assume the transmitted radar signal is a sequence of many short pulses of amplitude x(n), where n indexes time, and that the radar receiver estimates only the range μ(n) at time n from each pulse. Let η(n) denote the error made in this estimate of range. A model for the quantity z(n) observed from each pulse is then

$$z(n) = \mu(n) + \eta(n)$$

where

$$E\{\eta^2(n)\} = \sigma^2 / x^2(n)$$

and σ is determined by the background noise and the bandwidth of the transmitted pulse. Since x(n)z(n) contains the same information as z(n), the receiver's observations can be modeled as

$$z(n) = x(n)\,\mu(n) + \eta(n)$$

where

$$E\{\eta^2(n)\} = \sigma^2 .$$

Further, assume η(n) is discrete white noise. For convenience, we will use a continuous time approximation to this model; that is,

$$z(t) = x(t)\,\mu(t) + \eta(t) \qquad (2.4)$$

where η(t) is white noise with power σ².† The choice of possible x(t) is restricted by constraints on its peak amplitude and total energy. It is desired to determine the x(t) which provides the best estimate of $\alpha_1$ and $\alpha_2$.

* A μ(t) of the form $\sum_{k=1}^{M} \alpha_k t^{k-1}$ is a trivial extension.

† Under appropriate assumptions on bandwidth and ability to resolve phase, the model of Eq. (2.4) is equivalent to that where x(t) is the amplitude modulation of a high frequency carrier and z(t) is the (complex) r-f signal. However, the proof of this equivalence is lengthy and is omitted.
3. MEAN INFORMATION CONTENT OF RECEIVED SIGNAL


Communication and radar systems receive transmitted signals which have passed through a channel which may change the signal's form and/or introduce additive observation noise. On the basis of the receiver's observations, a decision of some nature is desired. The prime difference between communication and radar systems is the nature of this decision. Communication systems must decide which signal was transmitted. Radar systems must decide the nature of the channel; for example, the distance between the radar and the target which reflects the signal. In both cases, classical statistical decision theory provides a basis for making and evaluating these decisions. The optimum transmitted signal is the signal which results in the "fewest" decision errors.

The probability distribution of the received signal obviously specifies the decision errors. However, for our development we employ only a particular function of the probability distributions rather than the distributions themselves. This function is the mean information contained in the observed signal. Our discussion is based on Ref. 6, although we use neither the same generality nor the same degree of rigor.

Consider the observable random variable z. Assume we know that z has one of two probability density functions, $f_1(z)$ or $f_2(z)$. On the basis of observing a particular realization of z, we want to decide between the two hypotheses $H_j$, j = 1, 2, where $H_j$ is the hypothesis that $f_j(z)$ is the probability density of z. Symbolically,

$$H_1 : f_1(z)$$
$$H_2 : f_2(z) .$$

Let $I(1:2/z)$ denote the mean information contained in an observation z for discrimination in favor of $H_1$ against $H_2$. This mean information is defined as the expected value, assuming $H_1$ is true, of the logarithm of the likelihood ratio:

$$I(1:2/z) = \int f_1(z) \log \frac{f_1(z)}{f_2(z)}\, dz \qquad (3.1)$$
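A few comments on this definition of mean information are provided in the appendix. For simplicity, we call $I(1:2/z)$ the information rather than the mean information.

If we make vector observations, $\underline{z} = [z_1, \ldots, z_N]'$, then

$$I(1:2/\underline{z}) = \int \cdots \int f_1(z_1, \ldots, z_N) \log \frac{f_1(z_1, \ldots, z_N)}{f_2(z_1, \ldots, z_N)}\, dz_1 \cdots dz_N \qquad (3.2)$$

or, more concisely,

$$I(1:2/\underline{z}) = \int f_1(\underline{z}) \log \frac{f_1(\underline{z})}{f_2(\underline{z})}\, d\underline{z} \qquad (3.3)$$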

Continuous time observations z(t), 0 ≤ t ≤ T, are a similar extension. Since discrete and continuous observations involve basically the same concepts, we abuse notation and let the symbol $I(1:2/\underline{z})$ also apply to the continuous time case. Information is additive in the following sense. If $z_1, \ldots, z_N$ are independent samples from the same distribution, then

$$I(1:2/\underline{z}) = \sum_{n=1}^{N} I(1:2/z_n) . \qquad (3.4)$$
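A minimal Python sketch of Eq. (3.1), under assumptions not in the original: the two hypotheses are taken to be Gaussian densities with a common variance that differ only in their mean, and the integral is approximated by the midpoint rule over a finite range. The helper names (`gaussian_pdf`, `mean_information`, `mean_information_gaussian_shift`) are hypothetical, and the closed form $\delta^2/(2\sigma^2)$ is the standard result for this Gaussian special case only.

```python
import math

def gaussian_pdf(z, mean, var):
    """Density of a scalar Gaussian with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mean_information(f1, f2, lo, hi, steps=20000):
    """Midpoint-rule estimate of I(1:2/z) = integral of f1 * log(f1 / f2), Eq. (3.1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        p, q = f1(z), f2(z)
        if p > 0.0:  # points where f1 vanishes contribute nothing
            total += p * math.log(p / q) * h
    return total

def mean_information_gaussian_shift(delta, var):
    """Closed form for H1: N(0, var) against H2: N(delta, var)."""
    return delta ** 2 / (2.0 * var)

if __name__ == "__main__":
    var, delta = 1.0, 1.5
    numeric = mean_information(lambda z: gaussian_pdf(z, 0.0, var),
                               lambda z: gaussian_pdf(z, delta, var),
                               -12.0, 12.0)
    print(numeric, mean_information_gaussian_shift(delta, var))
```

By the additivity of Eq. (3.4), N independent samples of this kind would carry N times the single-sample value.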

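Now consider the case where the two distributions $f_1(\underline{z})$ and $f_2(\underline{z})$ differ only with respect to the values of a vector parameter $\underline{\theta}$. That is,

$$H_1 : f(\underline{z}/\underline{\theta}_1)$$
$$H_2 : f(\underline{z}/\underline{\theta}_2) .$$

If $\underline{\theta}_1 = \underline{\theta}$ and $\underline{\theta}_2 = \underline{\theta} + \Delta\underline{\theta}$, where $\Delta\underline{\theta}$ is small, then under sufficient regularity conditions,

$$2I(1:2/\underline{z}) \approx \Delta\underline{\theta}'\, \mathcal{I}\, \Delta\underline{\theta} \qquad (3.5)$$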
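where the prime denotes transpose and $\mathcal{I}$ is the positive definite Fisher information matrix whose elements $i_{jk}$ are given by

$$i_{jk} = \int \frac{1}{f(\underline{z},\underline{\theta})}\, \frac{\partial f(\underline{z},\underline{\theta})}{\partial \theta_j}\, \frac{\partial f(\underline{z},\underline{\theta})}{\partial \theta_k}\, d\underline{z} \qquad (3.6)$$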


See Sec. 2.6 of Ref. 6 for derivations of Eq. (3.5) and Eq. (3.6).
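To make Eqs. (3.5) and (3.6) concrete for a non-Gaussian density, the sketch below assumes a single Poisson parameter λ (the same family reappears in the laser radar example later in this section). It evaluates the Fisher information of Eq. (3.6) by summing over counts and compares $2I(1:2/z)$ for a small shift Δλ with the quadratic form of Eq. (3.5). The helper names and the truncation `z_max` are illustrative assumptions.

```python
import math

def poisson_pmf(z, lam):
    """Poisson probability of z counts with mean lam, computed in log space."""
    return math.exp(z * math.log(lam) - lam - math.lgamma(z + 1))

def fisher_information_poisson(lam, z_max=200):
    """Eq. (3.6) for one parameter: sum over z of f(z) * (d log f / d lam)**2."""
    total = 0.0
    for z in range(z_max + 1):
        score = z / lam - 1.0  # derivative of log f(z) with respect to lam
        total += poisson_pmf(z, lam) * score ** 2
    return total

def mean_information_poisson(lam1, lam2):
    """Exact I(1:2/z) of Eq. (3.1) for H1: Poisson(lam1) against H2: Poisson(lam2)."""
    return lam1 * math.log(lam1 / lam2) + lam2 - lam1

if __name__ == "__main__":
    lam, dlam = 4.0, 0.01
    lhs = 2.0 * mean_information_poisson(lam, lam + dlam)  # left side of Eq. (3.5)
    rhs = dlam ** 2 * fisher_information_poisson(lam)       # right side of Eq. (3.5)
    print(lhs, rhs)
```

The two sides agree only to first order in Δλ, which is the sense of the approximation in Eq. (3.5).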
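If z(t) is of the form

$$z(t) = v(t, \underline{\theta}) + \xi(t), \qquad 0 \le t \le T \qquad (3.7)$$

where ξ(t) is white noise with power σ², then it is not difficult to show

$$\mathcal{I} = \frac{1}{\sigma^2} \int_0^T \underline{\phi}(t)\, \underline{\phi}'(t)\, dt \qquad (3.8)$$

where the elements $\phi_j(t)$ of $\underline{\phi}(t)$ are

$$\phi_j(t) = \frac{\partial v(t, \underline{\theta})}{\partial \theta_j} . \qquad (3.9)$$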


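If $\mathcal{I}(t)$ denotes the Fisher information matrix for z(τ), 0 ≤ τ ≤ t, then

$$\frac{d\mathcal{I}(t)}{dt} = \frac{1}{\sigma^2}\, \underline{\phi}(t)\, \underline{\phi}'(t) \qquad (3.10)$$

where the left-hand side denotes the matrix of the time derivatives of the elements of $\mathcal{I}(t)$. If $\Sigma_\theta$ denotes the covariance matrix of any unbiased estimate of $\underline{\theta}$, then

$$\Sigma_\theta \ge \mathcal{I}^{-1} \qquad (3.11)$$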
where the inequality means the difference is a nonnegative definite matrix.* If Eq. (3.7) is linear, that is,

$$v(t, \underline{\theta}) = \sum_k \theta_k\, \phi_k(t) \qquad (3.12)$$

then equality in Eq. (3.11) is obtained when a minimum variance unbiased estimate is used.

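The model of Eq. (3.7) and its implications, Eqs. (3.10) and (3.11), are well known. However, the following extension enables the modeling of a far wider range of problems. For simplicity, we consider only linear systems. Assume vector observations $\underline{z}(t)$ of the form

$$\underline{z}(t) = C(t)\, \underline{y}(t) + \underline{\xi}_2(t) \qquad (3.13)$$

$$\frac{d\underline{y}(t)}{dt} = A(t)\, \underline{y}(t) + B(t)\, \underline{\xi}_1(t) \qquad (3.14)$$

where $\underline{y}(t)$ is a vector, A(t), B(t), and C(t) are known, possibly time-varying matrices, and $\underline{\xi}_1(t)$ and $\underline{\xi}_2(t)$ are vector white noise processes, independent of each other, with

$$E[\underline{\xi}_1(t)\, \underline{\xi}_1'(\tau)] = \Sigma_1\, \delta(t - \tau) \qquad (3.15)$$

$$E[\underline{\xi}_2(t)\, \underline{\xi}_2'(\tau)] = \Sigma_2\, \delta(t - \tau) \qquad (3.16)$$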

Equations (3.13) and (3.14) are the state variable representation for finite dimensional, time-varying, linear processes;† $\underline{y}(t)$ is a Markov process. By appropriate choice of the driving noise $\underline{\xi}_1(t)$ and the system dynamics A(t), B(t), C(t), a wide variety of deterministic and stochastic signals and correlated observation noise can be modeled. Equation (3.14) is a linear system driven by white noise and is an extension of the "pre-whitening" filter concept discussed in Ref. 9. If $\mathcal{I}(t)$ denotes the Fisher information matrix for the vector observations $\underline{z}(\tau)$, 0 ≤ τ ≤ t, of Eq. (3.13), it can be shown that

$$\frac{d\mathcal{I}(t)}{dt} = -\mathcal{I}(t)A(t) - A'(t)\mathcal{I}(t) - \mathcal{I}(t)B(t)\Sigma_1 B'(t)\mathcal{I}(t) + C'(t)\Sigma_2^{-1}C(t) \qquad (3.17)$$

* This inequality is often called either the Cramér-Rao or Information Inequality and is used as a basis for defining the efficiencies of estimates (see Sec. 3.5 of Ref. 6 or Sec. 12.6 of Ref. 7). Reference 6 also extends the inequality to the case of biased estimates.

† Reference 8 discusses the state variable concept in great detail.


The $\mathcal{I}(t)$ of Eq. (3.17) is the information available on $\underline{y}(t)$ at time t. This is related to Eqs. (3.7) and (3.10) by associating $\underline{\theta}$ with $\underline{y}(t)$. References 10 and 11 made the original investigations of equations such as Eq. (3.17). Reference 12 uses a different method of derivation.* The initial condition for Eq. (3.17), i.e., $\mathcal{I}(0)$, depends on the information available before the receipt of the signal. Equation (3.17) is a matrix Riccati equation. The first two terms show how the information available on $\underline{y}(t)$ varies with time (with $\underline{y}(t)$) when no driving noise is present ($\Sigma_1 = 0$) and when no observations are made ($\Sigma_2^{-1} = 0$); that is, how $\mathcal{I}(0)$ evolves. The third term indicates the rate of decrease in information caused by the driving noise, while the fourth term gives the rate of increase of information from the observations.
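The Riccati equation (3.17) is straightforward to integrate numerically. The sketch below is a hypothetical illustration that assumes constant matrices A, B, C (the text allows them to vary with time), takes $\Sigma_2^{-1}$ directly as an argument, and uses plain forward-Euler steps with lists of rows as matrices rather than any particular linear-algebra library. For the scalar check with A = −1 and B = Σ1 = C = Σ2 = 1, the steady state of Eq. (3.17) solves $0 = 2i - i^2 + 1$, so $i = 1 + \sqrt{2}$, which is the reciprocal of the steady-state minimum variance $\sqrt{2} - 1$.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

def riccati_information(A, B, S1, C, S2_inv, I0, T, dt=1e-3):
    """Forward-Euler solution of Eq. (3.17) on [0, T] with constant matrices.

    dI/dt = -I A - A' I - I B S1 B' I + C' S2^{-1} C, starting from I(0) = I0.
    """
    n = len(I0)
    At = transpose(A)
    BSB = mat_mul(mat_mul(B, S1), transpose(B))       # B S1 B'
    gain = mat_mul(mat_mul(transpose(C), S2_inv), C)  # C' S2^{-1} C
    I = [row[:] for row in I0]
    for _ in range(int(round(T / dt))):
        IA = mat_mul(I, A)
        AtI = mat_mul(At, I)
        IBI = mat_mul(mat_mul(I, BSB), I)
        I = [[I[r][c] + dt * (-IA[r][c] - AtI[r][c] - IBI[r][c] + gain[r][c])
              for c in range(n)] for r in range(n)]
    return I

if __name__ == "__main__":
    # Scalar example, starting with no prior information: I(0) = 0.
    print(riccati_information([[-1.0]], [[1.0]], [[1.0]], [[1.0]], [[1.0]], [[0.0]], T=20.0))
```

Euler stepping is chosen only for transparency; a production solver would use an adaptive integrator and exploit the symmetry of $\mathcal{I}(t)$.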


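Now consider the basic problem of parameter estimation; that is, deciding on an explicit estimate for the values of some parameter set $\underline{\theta}$. Let $\hat{\underline{\theta}}$ denote the estimate of $\underline{\theta}$ made from the observations $\underline{z}$. For a high signal-to-noise ratio,† $\Delta\underline{\theta} = \hat{\underline{\theta}} - \underline{\theta}$ is small and the Fisher information matrix $\mathcal{I}$ measures the information contained in $\underline{z}$ for discriminating between $\underline{\theta}$ and $\hat{\underline{\theta}}$.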


* These references actually derive the equation for the covariance matrix $\Sigma_y(t)$ of the minimum variance (unbiased) estimate of $\underline{y}(t)$. However, since the model is linear, $\mathcal{I}^{-1}(t) = \Sigma_y(t)$.

† The necessary magnitude of the signal-to-noise ratio is intimately related to the dynamics of the signal. The ideas of Ref. 13 are valuable in considering such problems.
The radar problem of Sec. 2 is one of parameter estimation, and the Fisher information is thus appropriate.* For the example continued through the following sections, the observation noise η(t) of Eq. (2.4) is assumed white. Thus, from Eq. (3.10) (or a very special case of Eq. (3.17)), with $\underline{\theta} = [\alpha_1, \alpha_2]'$ and therefore $\underline{\phi}(t) = [x(t),\ t\,x(t)]'$,

$$\frac{d\mathcal{I}(t)}{dt} = \frac{x^2(t)}{\sigma^2} \begin{bmatrix} 1 & t \\ t & t^2 \end{bmatrix} \qquad (3.18)$$
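As a check on Eq. (3.18), the following sketch integrates $d\mathcal{I}/dt$ with the midpoint rule for an assumed amplitude modulation x(t) and inverts the result to obtain the bound of Eq. (3.11) on the covariance of unbiased estimates of $\alpha_1$ and $\alpha_2$. For the constant-amplitude test case, x(t) = A on [0, T], integrating Eq. (3.18) directly gives $i_{11} = A^2 T/\sigma^2$, $i_{12} = A^2 T^2/(2\sigma^2)$, and $i_{22} = A^2 T^3/(3\sigma^2)$. The function names and parameter values are hypothetical.

```python
def radar_information(x, sigma2, T, steps=100_000):
    """Integrate Eq. (3.18) from I(0) = 0 with the midpoint rule.

    x is an assumed amplitude modulation x(t) on [0, T]; the result is the
    2x2 information matrix for the parameters (alpha1, alpha2) of Eq. (2.3).
    """
    h = T / steps
    i11 = i12 = i22 = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        w = x(t) ** 2 / sigma2 * h  # x^2(t) / sigma^2 * dt
        i11 += w
        i12 += w * t
        i22 += w * t * t
    return [[i11, i12], [i12, i22]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix; by Eq. (3.11) it lower-bounds the estimate covariance."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

if __name__ == "__main__":
    info = radar_information(lambda t: 2.0, sigma2=0.5, T=3.0)
    print(info, inverse_2x2(info))
```

Any nonconstant x(t), such as a peak-limited pulse shape, can be passed in the same way; only the weighting $x^2(t)$ in Eq. (3.18) changes.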


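However, to illustrate the wide range of models possible using Eqs. (3.13) and (3.14), temporarily assume that

$$\eta(t) = \eta_1(t) + \eta_2(t) + \eta_3 \qquad (3.19)$$

where $\eta_1(t)$ is a second-order, nonstationary Markov process of the form

$$\frac{d^2\eta_1(t)}{dt^2} + a_1(t)\frac{d\eta_1(t)}{dt} + a_2(t)\,\eta_1(t) = \text{white noise},$$

$\eta_2(t)$ is white noise, and $\eta_3$ is an unknown constant representing an unknown mean value. Define

$$\frac{dy_1(t)}{dt} = y_2(t), \qquad \frac{dy_2(t)}{dt} = -a_2(t)\,y_1(t) - a_1(t)\,y_2(t) + \xi_1(t), \qquad \frac{dy_3(t)}{dt} = 0$$

where $\xi_1(t)$ is white noise and where we associate $\eta_1(t)$ with $y_1(t)$, $\eta_2(t)$ with $\xi_2(t)$ (white noise), and $\eta_3$ with $y_3(t)$. We can combine this observation noise with the radar example by letting the state variables $y_4(t)$ and $y_5(t)$ correspond to the constants $\alpha_1$ and $\alpha_2$ of Eq. (2.3), so that $dy_4/dt = dy_5/dt = 0$. Thus

$$z(t) = y_1(t) + y_3(t) + x(t)\,y_4(t) + t\,x(t)\,y_5(t) + \xi_2(t)$$

and, in terms of Eqs. (3.13) and (3.14), the radar example with the noise of Eq. (3.19) is

$$A(t) = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ -a_2(t) & -a_1(t) & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad B(t) = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad C(t) = \begin{bmatrix} 1 & 0 & 1 & x(t) & t\,x(t) \end{bmatrix} .$$

* Since the model is a linear, Gaussian process, an assumption of high signal-to-noise ratio is not needed, as $\mathcal{I}^{-1}$ gives the variance with which the parameters can be estimated.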
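However, instead of representing the range as in Eq. (2.3), we could define

$$y_4(t) = \mu(t), \qquad y_5(t) = \frac{d\mu(t)}{dt},$$

obtain the differential equations

$$\frac{dy_4(t)}{dt} = y_5(t), \qquad \frac{dy_5(t)}{dt} = 0,$$

and then consider the problem of estimating the range and range rate, $y_4(t)$ and $y_5(t)$, at time t rather than $\alpha_1$ and $\alpha_2$, the range and range rate at time zero. With this approach, B(t) is still unchanged while

$$A(t) = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ -a_2(t) & -a_1(t) & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad C(t) = \begin{bmatrix} 1 & 0 & 1 & x(t) & 0 \end{bmatrix} .$$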
With this second formulation, extension to targets with stochastic motion can be handled by changing B(t) and $\underline{\xi}_1(t)$. Reference 14 considers the use and implications of such models in filter design.

The communication type problem requires a decision which differs from parameter estimation, as we want to decide which, if any, signal was transmitted. This is not a choice between a parameter $\underline{\theta}$ and a small perturbation $\underline{\theta} + \Delta\underline{\theta}$. Thus the Fisher information matrix does not necessarily apply and we must return to the basic definition of information. However, a result analogous to the Fisher information matrix is obtained.

For the communication problem of Sec. 2, the observation noise η(t) of Eq. (2.2) is white. We begin with a discrete time version of Eq. (2.2),

$$z(n) = \varphi(n) + \eta(n), \qquad n = 1, \ldots, N \qquad (3.20)$$

where

$$\varphi(n) = \sum_{k=1}^{M} \alpha_k\, x_k(n) \qquad (3.21)$$

and η(n) is discrete white noise with variance σ². Let $\underline{z}$ denote the vector of the z(n). Then

$$f(\underline{z}) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{n=1}^{N} \big(z(n) - \varphi(n)\big)^2\right] .$$
We want to decide which, if any, of the $\alpha_k$ is one. M hypothesis tests of the form

$$H_1 : \alpha_k = 0 \ \text{ all } k$$
$$H_2 : \alpha_k = \begin{cases} 0, & k \ne j \\ 1, & k = j \end{cases} \qquad (3.22)$$

for j = 1, ..., M would solve the problem (provided $H_2$ was accepted only once). Let $I_j(1:2/\underline{z})$ denote the information for the jth of these tests. Using Eq. (3.2), it is not difficult to show that

$$2I_j(1:2/\underline{z}) = \frac{1}{\sigma^2} \sum_{n=1}^{N} x_j^2(n) \qquad (3.23)$$


Thus, for this case, $I_j(1:2/\underline{z})$ is proportional to the waveform's energy. Equation (3.23) is the information which controls type 1 errors; that is, deciding that $\alpha_j = 0$ when in fact $x_j(t)$ was transmitted. The type 2 error of deciding $\alpha_j = 1$ when in fact $x_k(t)$ was transmitted is also important and is controlled by the information with respect to hypotheses of the form

$$H_1 : \alpha_i = \begin{cases} 0, & i \ne j \\ 1, & i = j \end{cases}$$
$$H_2 : \alpha_i = \begin{cases} 0, & i \ne k \\ 1, & i = k \end{cases} \qquad (3.24)$$


for j, k = 1, ..., M, j ≠ k. Let $I_{jk}(1:2/\underline{z})$ denote the information for one of these tests. Then

$$2I_{jk}(1:2/\underline{z}) = \frac{1}{\sigma^2} \sum_{n=1}^{N} \big(x_j(n) - x_k(n)\big)^2 \qquad (3.25)$$


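Thus, for this case, $I_{jk}(1:2/\underline{z})$ is proportional to the "distance" between the waveforms.

Now define the M × M matrix $\hat{\mathcal{I}}$ with elements $i_{jk}$ by

$$\hat{\mathcal{I}} = \frac{1}{\sigma^2} \sum_{n=1}^{N} \underline{x}(n)\, \underline{x}'(n), \qquad \text{i.e.,} \quad i_{jk} = \frac{1}{\sigma^2} \sum_{n=1}^{N} x_j(n)\, x_k(n) \qquad (3.26)$$

where the $x_k(n)$ are the elements of $\underline{x}(n)$. By inspection,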
$$2I_j(1:2/\underline{z}) = i_{jj}, \qquad 2I_{jk}(1:2/\underline{z}) = i_{jj} - 2i_{jk} + i_{kk} .$$


Thus $\hat{\mathcal{I}}$ contains all the terms that control the probabilities of the type 1 and type 2 errors. Extending Eq. (3.26) to the continuous time case gives

$$\hat{\mathcal{I}} = \frac{1}{\sigma^2} \int_0^T \underline{x}(t)\, \underline{x}'(t)\, dt \qquad (3.27)$$

which has exactly the same form as Eq. (3.10). Thus $\hat{\mathcal{I}}$ of Eq. (3.27) is the same as the Fisher information matrix for estimating the $\alpha_k$ of Eq. (2.2). A finite $\hat{\mathcal{I}}(0)$ implies an a priori Gaussian distribution on the transmitted signals or the incorporation of the first two moments of such an a priori distribution.
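The relations between Eqs. (3.23), (3.25), and (3.26) can be checked numerically. The sketch below assumes sampled waveforms stored as Python lists (the four-sample test set is purely hypothetical) and forms $\hat{\mathcal{I}}$ of Eq. (3.26); its diagonal reproduces $2I_j$ and the combination $i_{jj} - 2i_{jk} + i_{kk}$ reproduces $2I_{jk}$. Function names are illustrative.

```python
def information_matrix(waveforms, sigma2):
    """Eq. (3.26): element (j, k) is sum_n x_j(n) x_k(n) / sigma2."""
    M = len(waveforms)
    return [[sum(a * b for a, b in zip(waveforms[j], waveforms[k])) / sigma2
             for k in range(M)] for j in range(M)]

def two_I_single(waveform, sigma2):
    """Eq. (3.23): 2 I_j(1:2/z) computed directly from one waveform's energy."""
    return sum(a * a for a in waveform) / sigma2

def two_I_pair(wj, wk, sigma2):
    """Eq. (3.25): 2 I_jk(1:2/z) computed directly from the waveform difference."""
    return sum((a - b) ** 2 for a, b in zip(wj, wk)) / sigma2

if __name__ == "__main__":
    xs = [[1.0, 1.0, -1.0, -1.0], [1.0, -1.0, 1.0, -1.0]]  # hypothetical waveforms
    info = information_matrix(xs, sigma2=2.0)
    print(info)
    print(info[0][0] - 2 * info[0][1] + info[1][1], two_I_pair(xs[0], xs[1], 2.0))
```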

Since our communication example involves a linear Gaussian process, the above result is almost a foregone conclusion. We outlined the basic arguments because they also apply in the general case. To decide which of M signals was transmitted, we evaluate the information on M tests such as Eq. (3.22) and M(M − 1)/2 tests such as Eq. (3.24). All these quantities are then summarized into a single matrix $\hat{\mathcal{I}}$, of which Eq. (3.27) is an example. In general, $\hat{\mathcal{I}}$ need not be the Fisher information matrix. However, it is easy to see that using Markov rather than white observation noise in the communication example does not affect the relation between $\hat{\mathcal{I}}$ and the Fisher matrix. The incorporation of observation noise such as Eq. (3.19) into the communication problem is therefore a straightforward extension of the radar example.

In future discussions we use the symbol $\mathcal{I}$ for either the Fisher information matrix or $\hat{\mathcal{I}}$ and simply call $\mathcal{I}$ the information matrix.

The main theme of this section is the illustration of how the information content of an observed signal can be represented in state variable notation. For our examples, the elements of the information matrix $\mathcal{I}(t)$ are the state variables, and they are defined by differential (or difference) equations such as Eq. (3.10) or (3.17). We have employed the information measure of Eq. (3.1) because it seems to be the most useful concept now available. However, this definition is not basic to our general theory of waveform design. The crux of the problem is the representation, in a state variable formulation, of the information content of the received signal. Any appropriate definition of information can be used.

The radar and communication examples of Sec. 2 are developed still further in the following sections. However, even though these two examples do illustrate many facets of the theory, they are both limited to the case of observing a deterministic signal in the presence of additive Gaussian noise. To show that this is not an inherent assumption, the information content of a signal is now evaluated for a simplified version of a laser radar.* This example, however, is not continued through the rest of the report and can be bypassed if desired.

Consider the following variation of the laser radar model discussed in Ref. 15. The radar transmitter is a photon source whose average power output is modulated by the waveform x(t). The receiver counts the number of photons which arrive. Let z(nΔt) be the number of photon arrivals during the nth time interval of length Δt. z(nΔt) is assumed to be a sample of a Poisson random variable, where

$$f(z) = \frac{\big(\lambda(n\Delta t)\big)^{z}\, e^{-\lambda(n\Delta t)}}{z!} \qquad (3.28)$$

where λ(nΔt) is the average rate of photon arrivals during the time interval Δt. The model for λ(nΔt) is

$$\lambda(n\Delta t) = \lambda_0 + \beta\, x\big(n - \Delta(n)\big) \qquad (3.29)$$

where β is the magnitude of the reflected signal, $\lambda_0$ is the mean rate of the background radiation (noise), and Δ(n) is the time delay between transmission and receipt of the signal. As in the radar example of Sec. 2, we assume

$$\Delta(n) = \alpha_1 + \alpha_2 n \qquad (3.30)$$

where $\alpha_1$ and $\alpha_2$ are related to the target's range and range rate. We have also made the assumption that Δt is small enough that $x(t - \Delta(t))$ is essentially constant over a time span Δt. We want to estimate $\alpha_1$ and $\alpha_2$ (assuming $\lambda_0$ and β are known) and therefore use the Fisher information matrix as the measure of information content. From Eq. (3.28),

$$\frac{\partial f(z)}{\partial \alpha_j} = f(z) \left( \frac{z(n\Delta t)}{\lambda(n\Delta t)} - 1 \right) \frac{\partial \lambda(n\Delta t)}{\partial \alpha_j} . \qquad (3.31)$$

After manipulation, Eq. (3.6) becomes

$$i_{jk}(n\Delta t) = \frac{1}{\lambda(n\Delta t)}\, \frac{\partial \lambda(n\Delta t)}{\partial \alpha_j}\, \frac{\partial \lambda(n\Delta t)}{\partial \alpha_k} \qquad (3.32)$$

where the $i_{jk}(n\Delta t)$ are the elements of $\mathcal{I}(n\Delta t)$, the information matrix for z(nΔt). Now assume $\alpha_1$ and $\alpha_2$ are small. Then

$$\frac{\partial \lambda(n\Delta t)}{\partial \alpha_1} \approx -\beta\, \frac{dx(n)}{dt}$$

and

$$\frac{\partial \lambda(n\Delta t)}{\partial \alpha_2} \approx -\beta\, n\, \frac{dx(n)}{dt}$$

where dx(n)/dt is dx(t)/dt evaluated at t = n. Thus

$$\mathcal{I}(n\Delta t) = \frac{\beta^2}{\lambda_0 + \beta x(n)} \left( \frac{dx(n)}{dt} \right)^2 \begin{bmatrix} 1 & n \\ n & n^2 \end{bmatrix} \qquad (3.33)$$

Equation (3.33) is the information obtained during one time interval. If we assume independence between time intervals, the total information is the sum of the $\mathcal{I}(n\Delta t)$. Thus, Eq. (3.33) is a discrete time analog of $d\mathcal{I}(t)/dt$. Note the similarity between Eq. (3.33) and Eq. (3.18). The assumption of low signal-to-noise ratio, $\lambda_0 \gg \beta x(n)$, as made in Ref. 15, obviously simplifies the problem.

* We have also investigated stochastic signals such as arise from multipath (see Ref. 16 or Chapt. 11 of Ref. 17). However, this topic is deferred to a later report.
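The small-delay approximation in Eq. (3.33) can be compared with Eq. (3.32) evaluated directly. In the sketch below the waveform x(t) = 1 + 0.5 sin(0.1t), the background rate λ0 = 10, the reflected magnitude β = 2, and the interval index are all hypothetical values, and the partial derivatives of Eq. (3.29) are taken by central differences. At α1 = α2 = 0 the two results should agree to within finite-difference error; the last helper sums Eq. (3.33) over intervals, assuming independence between them as in the text.

```python
import math

def laser_information_exact(x, n, lam0, beta, alpha1=0.0, alpha2=0.0, eps=1e-6):
    """Eq. (3.32) for one interval, with central-difference partials of Eq. (3.29)."""
    def rate(a1, a2):
        # lambda = lam0 + beta * x(n - Delta(n)), with Delta(n) = a1 + a2 n, Eq. (3.30)
        return lam0 + beta * x(n - (a1 + a2 * n))
    lam = rate(alpha1, alpha2)
    d1 = (rate(alpha1 + eps, alpha2) - rate(alpha1 - eps, alpha2)) / (2.0 * eps)
    d2 = (rate(alpha1, alpha2 + eps) - rate(alpha1, alpha2 - eps)) / (2.0 * eps)
    return [[d1 * d1 / lam, d1 * d2 / lam], [d2 * d1 / lam, d2 * d2 / lam]]

def laser_information_approx(x, dxdt, n, lam0, beta):
    """Eq. (3.33): information for one interval when the delay is small."""
    c = beta ** 2 * dxdt(n) ** 2 / (lam0 + beta * x(n))
    return [[c, c * n], [c * n, c * n * n]]

def total_information(x, dxdt, n_values, lam0, beta):
    """Sum of Eq. (3.33) over independent intervals."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for n in n_values:
        step = laser_information_approx(x, dxdt, n, lam0, beta)
        total = [[total[r][c] + step[r][c] for c in range(2)] for r in range(2)]
    return total

if __name__ == "__main__":
    x = lambda t: 1.0 + 0.5 * math.sin(0.1 * t)   # hypothetical modulation
    dxdt = lambda t: 0.05 * math.cos(0.1 * t)
    print(laser_information_exact(x, 5, 10.0, 2.0))
    print(laser_information_approx(x, dxdt, 5, 10.0, 2.0))
```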
4. CRITERIA


Since information has been defined by a positive definite matrix, a maximum matrix occurs if the difference between the maximum matrix and any other attainable matrix is nonnegative definite. Unfortunately, such a maximum is rarely possible, and it is usually necessary to choose a scalar criterion. The variety of conflicting criteria often leaves the designer impaled on the horns of a dilemma, but this is part of the price one must pay when asking for a mathematically optimum waveform. The concept of noninferior controls (waveforms), as discussed in Ref. 18, has proven valuable in these situations, but here we merely illustrate the wide range of possible criteria and discuss how they are handled.

Consider parameter estimation. In general, the waveform that gives the best estimate of one parameter, say range, does not correspond to the best waveform for estimating a different parameter, say range rate. If one parameter is most important, a reasonable criterion is to maximize the information $i_{jj}$ on this one parameter. For linear Gaussian processes such as the radar example of Sec. 2, the variance σ² of the estimate of the parameter is the corresponding diagonal element of $\mathcal{I}^{-1}$, which reduces to $1/i_{jj}$ when the remaining parameters are known.* A useful extension of this idea is to maximize the information on one parameter under a constraint that the information on another is equal to or greater than some specified level. Maximization of information on some weighted combination of the parameters is also reasonable.

The determinant and trace are useful measures of the magnitude of a positive definite matrix which have the desirable property of being invariant with respect to orthogonal transformations of the parameters. For linear problems, the trace of $\mathcal{I}^{-1}$ is the expected value of the square of the magnitude of the error vector, while the determinant of $\mathcal{I}^{-1}$, $|\mathcal{I}^{-1}| = 1/|\mathcal{I}|$, is a measure of the volume of the error ellipsoid. In radar problems, the motion of the target is often such that the determinant has the useful property of being independent of the time at which the state of the target is estimated. Equation (2.3) is an example of such motion.

* Maximization of the information on, say, the range parameter makes the second partial derivative with respect to range of the magnitude of the ambiguity function most negative and thereby locally "peaks" the ambiguity function in the range dimension (see Ref. 4).
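The remark that the determinant does not depend on the time at which the target state is estimated can be checked for the motion of Eq. (2.3). If the state at time τ is $[\mu(\tau), d\mu(\tau)/d\tau]' = \Phi(\tau)[\alpha_1, \alpha_2]'$ with $\Phi(\tau) = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix}$, the information on the new state is $(\Phi^{-1})'\,\mathcal{I}\,\Phi^{-1}$, and since $|\Phi| = 1$ the determinant is unchanged while the trace generally is not. The sketch below uses hypothetical helper names; its test matrix is the constant-amplitude case of Eq. (3.18) with $A^2/\sigma^2 = 8$ and T = 3, for which $\mathcal{I} = \begin{bmatrix} 24 & 36 \\ 36 & 72 \end{bmatrix}$.

```python
def mat_mul_2x2(a, b):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]

def det_2x2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def information_at_time(info, tau):
    """Information on (mu(tau), dmu/dtau) given information on (alpha1, alpha2), Eq. (2.3)."""
    phi_inv = [[1.0, -tau], [0.0, 1.0]]    # inverse of Phi(tau) = [[1, tau], [0, 1]]
    phi_inv_t = [[1.0, 0.0], [-tau, 1.0]]  # its transpose
    return mat_mul_2x2(mat_mul_2x2(phi_inv_t, info), phi_inv)

if __name__ == "__main__":
    info = [[24.0, 36.0], [36.0, 72.0]]
    for tau in (0.0, 1.5, 3.0):
        moved = information_at_time(info, tau)
        print(tau, det_2x2(moved), moved[0][0] + moved[1][1])
```

At τ = 1.5, the midpoint of the observation interval, the range and range-rate information happen to decouple, which shows why trace-type criteria depend on the chosen estimation time while the determinant does not.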

A "good" information matrix for the communication type problem is one with large main diagonal elements (Eq. (3.23) large) and small off-diagonal elements (Eq. (3.25) large). If the time duration T of the signal is fixed, a natural choice of criterion is to require

$$i_{kk}(T) \le \text{constant}, \qquad k = 1, \ldots, M \qquad (4.1)$$

and then maximize the minimum of

$$i_{kk}(T) - 2i_{kj}(T) + i_{jj}(T), \qquad j, k = 1, \ldots, M, \quad j \ne k . \qquad (4.2)$$

Of course, other criteria abound. For example, the determinant or trace, or a weighted combination of the $i_{kk}(T)$ and $i_{jk}(T)$ raised to some power, could be used. However, such approaches do not appear to have a natural interpretation.
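A minimal sketch of the criterion of Eqs. (4.1) and (4.2), assuming sampled waveforms and the matrix of Eq. (3.26): it evaluates the smallest pairwise term $i_{kk} - 2i_{kj} + i_{jj}$ for a candidate set. The two hypothetical M = 2 sets below have the same energy, so both meet the same Eq. (4.1) bound, and the antipodal pair scores twice the orthogonal pair, consistent with the familiar preference for antipodal signaling when M = 2.

```python
def information_matrix(waveforms, sigma2):
    """Eq. (3.26) for sampled waveforms stored as lists."""
    M = len(waveforms)
    return [[sum(a * b for a, b in zip(waveforms[j], waveforms[k])) / sigma2
             for k in range(M)] for j in range(M)]

def min_pair_criterion(info):
    """Smallest i_kk - 2 i_kj + i_jj over j != k, the quantity maximized in Eq. (4.2)."""
    M = len(info)
    return min(info[k][k] - 2.0 * info[k][j] + info[j][j]
               for j in range(M) for k in range(M) if j != k)

def max_energy_term(info):
    """Largest diagonal element, to compare with the constant of Eq. (4.1)."""
    return max(info[k][k] for k in range(len(info)))

if __name__ == "__main__":
    antipodal = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
    orthogonal = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, -1.0, -1.0]]
    for name, ws in (("antipodal", antipodal), ("orthogonal", orthogonal)):
        info = information_matrix(ws, sigma2=1.0)
        print(name, max_energy_term(info), min_pair_criterion(info))
```

Only a finite candidate set is compared here; the optimization over all waveforms meeting Eq. (4.1) is what the state variable formulation below is meant to handle.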

In many problems we have a specified performance which must be achieved. For example, the parameter estimate variance or the type 1 and type 2 error probabilities may be required to be equal to or below certain specified levels. We might then want the signal of minimum time duration T which satisfies the specifications. The criterion is then to minimize T. Instead of minimizing T, we might want to minimize the signal energy or its bandwidth (see the next section).

Often, only a portion of the information matrix is germane to the choice of criteria. For example, in Sec. 3 we discussed how extra state variables (nuisance variables) are introduced to handle nonwhite observation noise. In such cases we partition the information matrix

$$\mathcal{I} = \begin{bmatrix} \mathcal{I}_{11} & \mathcal{I}_{12} \\ \mathcal{I}_{21} & \mathcal{I}_{22} \end{bmatrix}$$

corresponding to a partitioning of the state vector into two subvectors, the second of which contains the nuisance variables. Let $\mathcal{I}_{1\cdot 2}$ denote the information available on just the first subvector. Then, by the matrix partitioning theorems or by the theory of testing a partitioned hypothesis (see Ref. 6),

$$\mathcal{I}_{1\cdot 2} = \mathcal{I}_{11} - \mathcal{I}_{12}\,\mathcal{I}_{22}^{-1}\,\mathcal{I}_{21}.$$

The matrix $\mathcal{I}_{1\cdot 2}$ is the information matrix to be used in our criteria.
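A quick numerical check of the partitioned-information formula may help. The sketch below (Python with NumPy; the 3 × 3 matrix is a made-up example, not data from this report, and the function name `reduced_information` is hypothetical) computes $\mathcal{I}_{1\cdot 2}$ for two parameters of interest and one nuisance variable, using a linear solve rather than an explicit inverse of $\mathcal{I}_{22}$.

```python
import numpy as np

def reduced_information(info, n1):
    """Information on the first n1 state variables after the remaining
    (nuisance) variables are eliminated: I11 - I12 I22^{-1} I21."""
    i11, i12 = info[:n1, :n1], info[:n1, n1:]
    i21, i22 = info[n1:, :n1], info[n1:, n1:]
    # np.linalg.solve(i22, i21) computes I22^{-1} I21 without forming the inverse
    return i11 - i12 @ np.linalg.solve(i22, i21)

# Hypothetical information matrix: two parameters of interest, one nuisance variable
info = np.array([[4.0, 1.0, 2.0],
                 [1.0, 3.0, 0.5],
                 [2.0, 0.5, 2.0]])
info_reduced = reduced_information(info, 2)   # [[2.0, 0.5], [0.5, 2.875]]
```

Two properties are worth noting: the inverse of the reduced matrix equals the upper-left 2 × 2 block of $\mathcal{I}^{-1}$, and $\mathcal{I}_{11} - \mathcal{I}_{1\cdot 2}$ is positive semidefinite, so ignoring the nuisance variables can only overstate the available information.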

All of the above criteria can be formulated in state variable notation. Define a state variable, $\ell(t)$, by a differential equation of the form

$$\frac{d\ell(t)}{dt} = g(\mathcal{I}(t), t). \qquad (4.3)$$


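It is desired to maximize $\ell(T)$ under various constraints on $\mathcal{I}(T)$. The following examples illustrate the technique. To maximize $|\mathcal{I}(T)|$ we might use

$$\ell(T) = \log|\mathcal{I}(T)|$$

and

$$\frac{d\ell(t)}{dt} = \operatorname{tr}\left[\mathcal{I}^{-1}(t)\,\frac{d\mathcal{I}(t)}{dt}\right]. \qquad (4.4)$$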
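Equation (4.4) is Jacobi's formula for the derivative of a log-determinant. As a sanity check, the sketch below compares it with a central finite difference of $\log|\mathcal{I}(t)|$ for an assumed information history: a unit a priori matrix plus $\int_0^t \phi(s)\phi'(s)\,ds$ with $\phi(s) = [1, s]'$. Both choices, and the helper names, are illustrative only.

```python
import numpy as np

def log_det(m):
    """log|m| for a positive definite matrix."""
    sign, value = np.linalg.slogdet(m)
    if sign <= 0:
        raise ValueError("information matrix must be positive definite")
    return value

I0 = np.eye(2)  # hypothetical a priori information

def info(t):
    """Hypothetical I(t) = I0 + int_0^t phi(s) phi'(s) ds with phi(s) = [1, s]'."""
    return I0 + np.array([[t, t**2 / 2], [t**2 / 2, t**3 / 3]])

def dinfo_dt(t):
    phi = np.array([1.0, t])
    return np.outer(phi, phi)

t, h = 0.7, 1e-5
finite_diff = (log_det(info(t + h)) - log_det(info(t - h))) / (2 * h)
rate = np.trace(np.linalg.solve(info(t), dinfo_dt(t)))  # right side of Eq. (4.4)
```

Integrating `rate` from 0 to T and adding $\log|\mathcal{I}(0)|$ recovers $\ell(T) = \log|\mathcal{I}(T)|$, which is why the criterion can be carried as an ordinary state variable.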


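For the criterion of Eqs. (4.1) and (4.2) we invoke Eq. (4.1) and set

$$\mathcal{I}_{kk}(T) - 2\,\mathcal{I}_{kj}(T) + \mathcal{I}_{jj}(T) \ge \ell(T) \qquad (4.5)$$

and

$$\frac{d\ell(t)}{dt} = 0. \qquad (4.6)$$

Since $\ell(t)$ is then constant, maximizing $\ell(T)$ maximizes the smallest of the quantities in Eq. (4.2). To minimize the time duration T, we set

$$\frac{d\ell(t)}{dt} = -1,$$

so that $\ell(T) = \ell(0) - T$, and apply the desired constraints on $\mathcal{I}(T)$.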
5. SIGNAL CONSTRAINTS


All signals are limited in time duration, bandwidth occupied, peak amplitude and total energy. However, it is often necessary to employ only the critical constraints in the mathematical formulation. For example, frequency modulation affects only the bandwidth of the transmitted signal. A total energy or peak amplitude constraint may result in an infinite-bandwidth signal, but the bandwidth of the actual system may be large enough to validate the use of the result.

It is useful to introduce a function, u(t), called the control function. This function specifies the waveform through either of the two relationships

$$x(t) = u(t) \qquad (5.1)$$

$$x(t) = w_1(t) \qquad (5.2)$$

where $w_1(t)$ is one element of the state vector w(t) defined by the vector equation

$$\frac{dw(t)}{dt} = D_1 w(t) + D_2 u(t). \qquad (5.3)$$

Two basic forms of constraints are discussed: a strict inequality constraint on the control function,

$$e_1 \le u(t) \le e_2, \qquad (5.4)$$

and integral constraints of the form*

$$\int_0^T s_k(w(t), u(t), t)\,dt \le e. \qquad (5.5)$$


Now consider how Eqs. (5.1) through (5.5) are used to express physical constraints. A total energy constraint on the waveform is obtained from

$$x(t) = u(t), \qquad \int_0^T u^2(t)\,dt \le e.$$

A peak amplitude constraint on the waveform is obtained from

$$x(t) = u(t), \qquad e_1 \le u(t) \le e_2.$$

* $e$ is used as a generic constant which is rarely related from one equation to the next.

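Let $X(\omega)$ be the Fourier transform of x(t). The second moment of $X(\omega)$ is a measure of bandwidth. By Parseval's theorem,

$$\int_{-\infty}^{\infty} \omega^2\,|X(\omega)|^2\,d\omega = \int_0^T \left(\frac{dx(t)}{dt}\right)^2 dt$$

to within a constant of proportionality. Therefore, a second-moment bandwidth constraint on the waveform is obtained from

$$x(t) = w(t), \qquad \frac{dw(t)}{dt} = u(t), \qquad \int_0^T u^2(t)\,dt \le e.$$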
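The Parseval relation can be checked numerically. With the convention $X(\omega) = \int x(t)e^{-j\omega t}\,dt$, the constant of proportionality is $2\pi$. The sketch below assumes x(t) is zero outside [0, T] and vanishes at both ends (otherwise the second moment diverges); the $\sin^2$ test pulse and the helper name `second_moment` are hypothetical choices for illustration.

```python
import numpy as np

def second_moment(x, dt):
    """Approximate the integral of omega^2 |X(omega)|^2 d(omega), where
    X(omega) = int x(t) exp(-j omega t) dt is approximated by dt * FFT(x)."""
    n = len(x)
    X = dt * np.fft.fft(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)
    d_omega = 2.0 * np.pi / (n * dt)
    return float(np.sum(omega**2 * np.abs(X) ** 2) * d_omega)

T, n = 1.0, 1000
dt = T / n
t = np.arange(n) * dt
x = np.sin(np.pi * t / T) ** 2                   # hypothetical pulse, x(0) = x(T) = 0
x_padded = np.concatenate([x, np.zeros(7 * n)])  # zero padding: x = 0 outside [0, T]

lhs = second_moment(x_padded, dt)
# dx/dt = (pi/T) sin(2 pi t/T), so int_0^T (dx/dt)^2 dt = pi^2 / (2T)
rhs = 2.0 * np.pi * np.pi**2 / (2.0 * T)
```

The frequency-domain quantity is thus obtained from a purely time-domain integral of $(dx/dt)^2$, which is what allows the bandwidth constraint to be written with the state equation $dw/dt = u$.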

By using general cases of Eqs. (5.2) and (5.3) and integral-square constraints on both u(t) and the elements of w(t), a wide range of bandwidth constraints of this type can be developed. If x(t) is a frequency modulation with a large modulation index, then the significant sidebands of the transmitted signal are contained within the range of values spanned by x(t) (see Ref. 19). Thus,

$$x(t) = u(t), \qquad e_1 \le u(t) \le e_2$$

could also be a bandwidth constraint.

The preceding constraints encompass a wide variety of practical problems. However, they do not include problems such as the combination of peak amplitude and bandwidth constraints illustrated by

$$w(t) = x(t), \qquad \frac{dw(t)}{dt} = u(t), \qquad \int_0^T u^2(t)\,dt \le e, \qquad e_1 \le w(t) \le e_2.$$

The last expression is called a phase coordinate constraint, because it bounds a state variable rather than the control. Such constraints can also be handled, albeit with more difficulty (see the next section).

As with the criteria of Sec. 4, we want the constraints expressed in terms of state variables and differential equations. Let s(w(t), u(t), t) be the vector composed of the $s_k(w(t), u(t), t)$ of Eq. (5.5). Define the state vector r(t) by

$$\frac{dr(t)}{dt} = s(w(t), u(t), t). \qquad (5.6)$$

With r(0) = 0, the constraints of Eq. (5.5) then become constraints on r(T). In Sec. 4 we saw that constraints on the state variables $\mathcal{I}(t)$ at time T may arise through the choice of criteria. Here we see that similar constraints on the state variables r(t) at time T result from the physical restrictions imposed on the transmitted signal.

Another way to handle constraints is to use a criterion which combines $\mathcal{I}(T)$ with some property of the signal; for example, maximize

$$\ell(T) = \log|\mathcal{I}(T)| - \gamma \int_0^T u^2(t)\,dt,$$

where $\gamma$ is a weighting constant. However, such an approach does not appear natural in most problems of interest.
6. PONTRYAGIN’S MAXIMUM PRINCIPLE


Pontryagin’s Maximum Principle is a concise statement of much of the classical theory of the calculus of variations. However, it also encompasses a wider range of problems, such as those with peak amplitude constraints. The Maximum Principle is discussed here with neither proof nor rigor, and only to a rather limited degree of generality.


Let q(t) denote a state vector with elements $q_j(t)$, j = 1, ..., p, defined by the following system of first-order differential equations:

$$\frac{dq_j(t)}{dt} = g_j(q(t), u(t), t), \qquad j = 1, \dots, p \qquad (6.1)$$

or, in complete vector notation,

$$\frac{dq(t)}{dt} = g(q(t), u(t), t) \qquad (6.2)$$


where the $g_j$ are the elements of g. In Sections 3, 4, and 5 we introduced the state variables which define the information, $\mathcal{I}(t)$; the criterion, $\ell(t)$; and the constraints, r(t) and w(t). These state variables are related to the waveform x(t) and the control function u(t). One element of q(t) is to be associated with each element of $\mathcal{I}(t)$, $\ell(t)$, r(t) and w(t). The differential equation, Eq. (6.2), is thus composed of equations such as Eqs. (3.17), (4.3), (5.3) and (5.6). The A(t), B(t), and C(t) of Eq. (3.17) are to be expressed, where necessary, in terms of w(t) and the control u(t). For example, Eq. (3.18) is a special case which illustrates the dependence on the signal x(t), which is, in turn, defined by w(t) and u(t).


Define the vector p(t), with elements $p_j(t)$, j = 1, ..., p, by the equations

$$H = p'(t)\,g(q(t), u(t), t) \qquad (6.3)$$

where the scalar quantity H is the Hamiltonian,* and

$$\frac{dp_j(t)}{dt} = -\frac{\partial H}{\partial q_j(t)}, \qquad j = 1, \dots, p$$

or, in vector notation,

$$\frac{dp(t)}{dt} = -\frac{\partial H}{\partial q(t)} \qquad (6.4)$$

where the right-hand side is the vector whose elements are the partials of H with respect to the $q_j(t)$. The $p_j(t)$ are called here the adjoint variables, although the terminology costate variables or Lagrange multipliers is also employed in the literature. Substituting Eq. (6.3) into Eq. (6.4) gives

$$\frac{dp(t)}{dt} = -G_q(q(t), u(t), t)\,p(t) \qquad (6.5)$$

where $G_q$ is the p × p matrix whose element in row j, column k is

$$\frac{\partial g_k(q(t), u(t), t)}{\partial q_j(t)}.$$

Equation (6.5) is the adjoint equation for the linearized version of Eq. (6.2), hence the name adjoint variables for p(t). Equation (6.2) can be rewritten as

$$\frac{dq(t)}{dt} = \frac{\partial H}{\partial p(t)}.$$


* This Hamiltonian is closely related to the Hamiltonian of classical physics, where the $p_j$ are the momentum variables.

Note that $\mathcal{I}(t)$ is a symmetric matrix while q(t) is a vector. It would be possible to write the significant elements of $\mathcal{I}(t)$ in a column vector, but it has proven conceptually valuable to maintain $\mathcal{I}(t)$ as a separate entity; namely, a matrix. Since this is rather unconventional, we indicate the form of the resulting equations and how they are manipulated when $\mathcal{I}(t)$ is given by Eq. (3.17). Define $q_1(t)$ as the vector containing all the state variables except those in $\mathcal{I}(t)$. Then

$$\frac{dq_1(t)}{dt} = g_1(q_1(t), \mathcal{I}(t), u(t), t).$$

Let $p_1(t)$ be the adjoint variables associated with $q_1(t)$. Since $\mathcal{I}(t)$ is a symmetric matrix, we represent its corresponding adjoint variables by the symmetric matrix $P_I(t)$. The Hamiltonian of Eq. (6.3) is now written in the form*

$$H = H_1 + H_2$$

where $H_1$ is the conventional form of Eq. (6.3),

$$H_1 = p_1'(t)\,g_1(q_1(t), \mathcal{I}(t), u(t), t),$$

while $H_2$ is, by inspection, of the form

$$H_2 = \operatorname{tr}\left[P_I(t)\,\frac{d\mathcal{I}(t)}{dt}\right],$$

where $d\mathcal{I}(t)/dt$ is Eq. (3.17) written as a function of $q_1(t)$, $\mathcal{I}(t)$, u(t) and t. Stating Eq. (6.4) in matrix notation,

$$\frac{dP_I(t)}{dt} = -\frac{\partial H}{\partial \mathcal{I}(t)},$$

where the right-hand side denotes the matrix of partials of H with respect to the elements of $\mathcal{I}(t)$. Then, by performing the necessary partial differentiations, it can be shown without too much difficulty that

$$\frac{dP_I(t)}{dt} = \left[A(t) + B(t)SB'(t)\mathcal{I}(t)\right]P_I(t) + P_I(t)\left[A'(t) + \mathcal{I}(t)B(t)SB'(t)\right] - \frac{\partial H_1}{\partial \mathcal{I}(t)}. \qquad (6.6)$$

* These H's obviously bear no relation to the hypotheses $H_1$ and $H_2$.


The last term of Eq. (6.6) and the equations for $dp_1(t)/dt$ depend on the particular problem. It should be emphasized that there is no new concept involved in this combination of vector and matrix differential equations. It is simply a convenient notation for dealing with a particular system of p equations.
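The partial differentiation behind the first two terms of Eq. (6.6) can be verified numerically. Equation (3.17) is not reproduced in this section, so the sketch below assumes, hypothetically, that it has the Riccati form $d\mathcal{I}/dt = -A'\mathcal{I} - \mathcal{I}A - \mathcal{I}BSB'\mathcal{I} + C$, which is the form consistent with the structure of Eq. (6.6). The matrices A, B, S, C and $P_I$ are random test values, and each element of $\mathcal{I}$ is treated as an independent variable when taking partials.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
S = np.diag([1.0, 2.0])          # hypothetical symmetric S
C = np.eye(n)                    # hypothetical constant term
X = rng.standard_normal((n, n))
P = X + X.T                      # symmetric adjoint matrix P_I
Y = rng.standard_normal((n, n))
info0 = Y + Y.T                  # point at which the partials are taken

def dinfo_dt(info):
    """ASSUMED Riccati form of Eq. (3.17): -A'I - IA - I B S B' I + C."""
    return -A.T @ info - info @ A - info @ B @ S @ B.T @ info + C

def h2(info):
    """H_2 = tr[P_I dI/dt]."""
    return np.trace(P @ dinfo_dt(info))

def elementwise_partials(f, info, eps=1e-6):
    """Central-difference partials of f with respect to each element of info."""
    grad = np.zeros_like(info)
    for i in range(info.shape[0]):
        for j in range(info.shape[1]):
            e = np.zeros_like(info)
            e[i, j] = eps
            grad[i, j] = (f(info + e) - f(info - e)) / (2 * eps)
    return grad

BSB = B @ S @ B.T
# First two terms on the right of Eq. (6.6); they should equal -dH_2/dI
predicted = (A + BSB @ info0) @ P + P @ (A.T + info0 @ BSB)
numerical = -elementwise_partials(h2, info0)
```

Because $H_2$ is quadratic in $\mathcal{I}$, the central difference is exact up to rounding, so any mismatch would point to an error in the differentiation rather than in the numerics.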

As discussed in Sec. 5, a strict inequality constraint

$$e_1 \le u(t) \le e_2 \qquad (6.7)$$

on the control function may exist. Let $u^\circ(t)$ denote the control function which maximizes $q_1(T)$, subject if necessary to Eq. (6.7), where we associate the criterion state variable, $\ell(t)$, with $q_1(t)$.

Now consider Pontryagin's Maximum Principle. The Hamiltonian of Eq. (6.3) is a function of t, p(t), q(t) and u(t); that is,

$$H = H(p(t), q(t), u(t), t). \qquad (6.8)$$

The Maximum Principle tells us that $u^\circ(t)$ must minimize H, subject if necessary to the constraint of Eq. (6.7). This means that $u^\circ$, expressed as a function of p(t), q(t), and t, must result in the minimum H with respect to variations in u. Section 7 will illustrate this minimization when a constraint such as Eq. (6.7) is necessary. When Eq. (6.7) is not required, the necessary conditions on $u^\circ(t)$ are obtained from $\partial H/\partial u(t) = 0$. This is illustrated in Sec. 8. Section 8 actually employs a vector control function, u(t), but this is a direct extension; that is, $u^\circ(t)$ must absolutely minimize H.

The preceding is not an actual statement of the Maximum Principle but rather a weaker corollary drawn from it. This corollary does not always provide all the desired information. For example, p(t) and q(t) may be such that, over some finite period of time, H is independent of the value of u, of its sign, or of its magnitude. This is called the singular case and can arise in certain problems of waveform design. However, for many problems of interest, the control which minimizes the Hamiltonian is unique and can be completely expressed in terms of p(t), q(t) and t. Thus u(t) can be eliminated from Eqs. (6.2) and (6.4) to give a system of 2p first-order differential equations in the 2p variables p(t) and q(t). Solution for the resulting p(t) and q(t) gives $u^\circ(t)$ and thus the optimum waveform.

Now consider the important question of boundary conditions for Eq. (6.2) and Eq. (6.4). We need 2p boundary conditions to specify a solution. The vector q(t) is composed of $\mathcal{I}(t)$, w(t), r(t) and $\ell(t)$; $\mathcal{I}(0)$ is the a priori information, while w(0), r(0), and $\ell(0)$ are part of the model. Thus, q(0) provides p boundary conditions. The other p conditions are obtained at time T. Some elements of q(T) are usually specified or functionally related by the criterion and constraints. However, certain of the elements of q(T) are usually unspecified. In the radar problem, a possible criterion is the maximization of $|\mathcal{I}(T)|$, with no specification on the elements of $\mathcal{I}(T)$. Thus, q(T) may furnish some but not all of the remaining p boundary conditions. The rest are obtained from the transversality conditions. Although these transversality conditions can become complex, the underlying principle is simple. Suppose the problem specifies the values of $p_1$ elements of q(T). The remaining $p - p_1$ elements are simply chosen such that the criterion state variable, $q_1(T)$ (or $\ell(T)$), is maximum. These are the transversality conditions. In more general problems where $p_1$ equations or inequalities on q(T) are specified, the transversality conditions simply become more complicated. The case of a free T is obviously a special case.

Thus, for many problems, the Maximum Principle defines the maximum-information signal in terms of the solution of a system of first-order differential equations with split boundary conditions; i.e., a two-point boundary value problem. In some cases the equations can be integrated in closed form to give a system of algebraic or transcendental equations. If p(0) were known, integration on a computer would be simple. Thus iterative search techniques can be employed on p(0) to determine which value provides the desired relationship among the elements of q(T). "Insight" may enable changes of variables which simplify the integration, or the beautiful Hamilton-Jacobi theory of classical physics can be employed. This leads to the theory of Poisson and Lagrange brackets and canonical transformations. The Hamilton-Jacobi theory is closely related to Dynamic Programming, which is still another way to approach the basic problem. Thus we are faced with a host of possible methods for obtaining explicit solutions. Unfortunately, the best choice depends on both the problem and the computing facilities available. Each new situation must usually be treated as an entity unto itself.
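To make the iterative search on p(0) concrete, here is a minimal shooting sketch for a toy problem that is not taken from this report: a double integrator $dx/dt = v$, $dv/dt = u$, driven from rest at x = 0 to rest at x = 1 in time T = 1 while minimizing $\frac{1}{2}\int_0^T u^2\,dt$ (a second-moment bandwidth measure in the sense of Sec. 5). Carrying the cost as an extra state with constant adjoint, normalized so that the minimized Hamiltonian is $H = p_x v + p_v u + u^2/2$, gives $u = -p_v$, $dp_x/dt = 0$ and $dp_v/dt = -p_x$. Newton's method with a finite-difference Jacobian adjusts the two unknown initial adjoints until x(T) and v(T) hit their targets. All names are hypothetical.

```python
import numpy as np

T = 1.0

def rhs(z):
    """State/adjoint dynamics for z = [x, v, p_x, p_v]; u = -p_v minimizes H."""
    _, v, p_x, p_v = z
    u = -p_v
    return np.array([v, u, 0.0, -p_x])

def integrate(p0, n_steps=100):
    """RK4 integration from x(0) = v(0) = 0 with guessed initial adjoints p0."""
    z = np.array([0.0, 0.0, p0[0], p0[1]])
    h = T / n_steps
    for _ in range(n_steps):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * h * k1)
        k3 = rhs(z + 0.5 * h * k2)
        k4 = rhs(z + h * k3)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return z

def miss(p0):
    """Terminal errors: we require x(T) = 1 and v(T) = 0."""
    z = integrate(p0)
    return np.array([z[0] - 1.0, z[1]])

def shoot(p0, tol=1e-10, max_iter=20, eps=1e-6):
    """Newton iteration on p(0) with a forward-difference Jacobian."""
    p0 = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        f = miss(p0)
        if np.max(np.abs(f)) < tol:
            break
        jac = np.empty((2, 2))
        for k in range(2):
            dp = np.zeros(2)
            dp[k] = eps
            jac[:, k] = (miss(p0 + dp) - f) / eps
        p0 = p0 - np.linalg.solve(jac, f)
    return p0

p0 = shoot([0.0, 0.0])   # converges to approximately [-12, -6]
```

For this toy problem the exact optimum is $u(t) = 6 - 12t$, i.e., $p_x = -12$ and $p_v(0) = -6$. Because only the waveform as a function of time is needed, such open-loop searches are acceptable here even though they would be unsuitable for feedback control.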

The Maximum Principle can be extended to include phase coordinate constraints such as those discussed in Sec. 5. However, this extension constitutes an order-of-magnitude increase in difficulty when it comes to actual solutions. It can also be extended to discrete-time systems.

The above discussions are obviously not complete, and the interested reader must resort to either previous knowledge or the literature if he wishes to actually solve problems. Two references on the Maximum Principle employed by the author are Refs. 20 and 21. Reference 22 contains an extensive bibliography. The reader who, like the author, is more interested in understanding the use rather than the proof of the theory may find the classical calculus of variations, as it was developed for physics, very valuable. Reference 23 is a good general reference, while Ref. 24 is a well-written work which relates optimal control theory to these classical approaches.
7. THE RADAR EXAMPLE


Consider the radar problem first stated in Sec. 2. The waveform is to be constrained with respect to both total energy and peak amplitude. Thus, following Sec. 5, we set

$$x(t) = u(t) \qquad (7.1)$$

$$\frac{dr(t)}{dt} = u^2(t) \qquad (7.2)$$

$$0 \le u(t) \le e_1 \quad \text{(peak amplitude constraint)} \qquad (7.3)$$

$$r(T) = e_2 \quad \text{(energy constraint)}. \qquad (7.4)$$

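Since $\mathcal{I}(t)$ depends only on $x^2(t)$, we have assumed in Eq. (7.3) that x(t) is never negative. Substituting Eq. (7.1) into Eq. (3.18) gives

$$\frac{d\mathcal{I}(t)}{dt} = \frac{u^2(t)}{\sigma^2}\,\phi(t)\phi'(t) \qquad (7.5)$$

where $\phi(t)$ and $\sigma^2$ are as in Eq. (3.18). For the sake of illustration, assume the criterion is the maximization of the determinant of $\mathcal{I}(T)$. Thus, following Sec. 4,

$$\ell(T) = \log|\mathcal{I}(T)| \qquad (7.6)$$

and, using Eq. (7.5) and Eq. (4.4),

$$\frac{d\ell(t)}{dt} = \frac{u^2(t)}{\sigma^2}\operatorname{tr}\left[\mathcal{I}^{-1}(t)\,\phi(t)\phi'(t)\right]. \qquad (7.7)$$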
In addition to the boundary conditions of Eqs. (7.4) and (7.6), we impose the conditions

$$\mathcal{I}(0) = 0, \qquad r(0) = 0, \qquad \ell(0) = 0.$$

The differential equation for q(t), Eq. (6.2), is the combination of Eqs. (7.2), (7.5) and (7.7). Let $P_I(t)$ be the adjoint (matrix) corresponding to $\mathcal{I}(t)$, while $p_r(t)$ and $p_\ell(t)$ correspond to r(t) and $\ell(t)$, respectively. The Hamiltonian of Eq. (6.3) then becomes, by inspection,

$$H = \frac{u^2(t)}{\sigma^2}\operatorname{tr}\left[P_I(t)\phi(t)\phi'(t)\right] + p_\ell(t)\,\frac{u^2(t)}{\sigma^2}\operatorname{tr}\left[\mathcal{I}^{-1}(t)\phi(t)\phi'(t)\right] + p_r(t)\,u^2(t). \qquad (7.8)$$

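Now, using Eq. (6.4),

$$\frac{dp_\ell(t)}{dt} = -\frac{\partial H}{\partial \ell(t)} = 0$$

and

$$\frac{dp_r(t)}{dt} = -\frac{\partial H}{\partial r(t)} = 0.$$

Thus $p_\ell(t)$ and $p_r(t)$ are constants, $p_\ell$ and $p_r$, respectively. Repeated use of matrix relations such as

$$\frac{d\mathcal{I}^{-1}}{dt} = -\mathcal{I}^{-1}\,\frac{d\mathcal{I}}{dt}\,\mathcal{I}^{-1}, \qquad \frac{\partial}{\partial \mathcal{I}}\operatorname{tr}\left[\mathcal{I}^{-1}\phi\phi'\right] = -\mathcal{I}^{-1}\phi\phi'\mathcal{I}^{-1}$$

gives

$$\frac{dP_I(t)}{dt} = -\frac{\partial H}{\partial \mathcal{I}(t)} = p_\ell\,\frac{u^2(t)}{\sigma^2}\,\mathcal{I}^{-1}(t)\phi(t)\phi'(t)\mathcal{I}^{-1}(t).$$

Since $d[p_\ell\,\mathcal{I}^{-1}(t)]/dt$ is exactly the negative of this expression,

$$\frac{d}{dt}\left[P_I(t) + p_\ell\,\mathcal{I}^{-1}(t)\right] = 0, \qquad P_I(t) + p_\ell\,\mathcal{I}^{-1}(t) = D,$$

a constant matrix, and the Hamiltonian of Eq. (7.8) becomes

$$H = u^2(t)\,S(t)$$

where

$$S(t) = \frac{1}{\sigma^2}\,\phi'(t)\,D\,\phi(t) + p_r.$$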


From the Maximum Principle we know that the optimum control, $u^\circ(t)$, must absolutely minimize H subject to the constraint of Eq. (7.3). Thus,

$$u^\circ(t) = \begin{cases} e_1, & S(t) < 0 \\ 0, & S(t) > 0. \end{cases}$$

Now S(t) is of the form

$$S(t) = c_0 + c_1 t + c_2 t^2.$$

Thus S(t) can change sign at most twice, which means $u^\circ(t)$, and thus the optimum waveform x(t), is "on-off" with at most two periods of full transmission.*
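The switching law translates directly into code. The sketch below (the function name `on_intervals` and the example coefficients are hypothetical) finds the real roots of the quadratic S(t) inside (0, T), tests the sign of S between consecutive roots, and returns the subintervals on which $u^\circ(t) = e_1$; it refuses the singular case in which S(t) vanishes identically.

```python
import numpy as np

def on_intervals(c, T, tol=1e-12):
    """Subintervals of [0, T] on which S(t) = c0 + c1 t + c2 t^2 is negative,
    i.e. where the minimizing control is u = e1 (full transmission)."""
    c0, c1, c2 = c
    if max(abs(c0), abs(c1), abs(c2)) < tol:
        raise ValueError("S(t) is identically zero: singular case")
    # np.roots strips leading zero coefficients, so a linear S(t) also works
    roots = np.roots([c2, c1, c0])
    cuts = sorted(r.real for r in roots if abs(r.imag) < tol and 0.0 < r.real < T)
    edges = [0.0] + cuts + [T]
    intervals = []
    for a, b in zip(edges[:-1], edges[1:]):
        mid = 0.5 * (a + b)
        if c0 + c1 * mid + c2 * mid**2 < 0:            # transmit at full amplitude
            if intervals and abs(intervals[-1][1] - a) < tol:
                intervals[-1] = (intervals[-1][0], b)  # merge across a tangent root
            else:
                intervals.append((a, b))
    return intervals

# Hypothetical switching function S(t) = -(t - 0.2)(t - 0.8) on [0, 1]:
# full transmission near both ends of the interval, silence in between
pulses = on_intervals((-0.16, 1.0, -1.0), 1.0)
```

Since a quadratic has at most two real roots, the returned list never contains more than two "on" periods, mirroring the argument in the text.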

The actual $u^\circ(t)$ could be found from the boundary and transversality conditions and a solution of the two-point boundary value problem. However, it is deemed easier to investigate all "on-off" signals which satisfy the constraints and which have at most two periods of transmission, to see which provides the maximum $|\mathcal{I}(T)|$. This can be considered a change of variables. The results of this investigation will be given in a later report, which also considers other criteria and the case of an accelerating target.

The actual results were obtained from a digital computer, but because of the small number and limited range of the parameters which characterize the possible optimum waveforms, very little computation time is involved. These results also show that the choice of criterion usually affects only the switch times, not the form of the optimum.
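A search of this kind can be sketched as follows. The sketch rests on hypothetical assumptions not stated in this section: a two-parameter model with $\phi(t) = [1, t]'$ (which makes S(t) quadratic, as above), $e_1 = \sigma = 1$, T = 1, and a total "on" time of 0.4 fixed by the energy constraint $r(T) = e_1^2 \times (\text{on time}) = e_2$. It enumerates waveforms with at most two pulses on a grid of start times and durations and keeps the one with the largest $|\mathcal{I}(T)|$.

```python
import itertools

def moments(a, b):
    """(int 1 dt, int t dt, int t^2 dt) over the interval [a, b]."""
    return (b - a, (b**2 - a**2) / 2, (b**3 - a**3) / 3)

def info_det(intervals, amp=1.0, sigma=1.0):
    """|I(T)| for an on-off waveform of amplitude amp, assuming the
    hypothetical phi(t) = [1, t]' so that I(T) = (amp^2/sigma^2) sum int phi phi' dt."""
    m0 = m1 = m2 = 0.0
    for a, b in intervals:
        d0, d1, d2 = moments(a, b)
        m0, m1, m2 = m0 + d0, m1 + d1, m2 + d2
    scale = amp**2 / sigma**2
    return scale**2 * (m0 * m2 - m1**2)

def best_two_pulse(T=1.0, on_time=0.4, step=0.02, tol=1e-9):
    """Grid search over waveforms with at most two 'on' periods whose
    total on-time is fixed by the energy constraint."""
    grid = [round(k * step, 10) for k in range(int(round(T / step)) + 1)]
    best_det, best_params = float("-inf"), None
    for a1, len1, a2 in itertools.product(grid, grid, grid):
        len2 = on_time - len1
        if len2 < -tol or a1 + len1 > a2 + tol or a2 + len2 > T + tol:
            continue                      # infeasible or overlapping pulses
        d = info_det([(a1, a1 + len1), (a2, a2 + len2)])
        if d > best_det:
            best_det, best_params = d, (a1, len1, a2)
    return best_det, best_params

det_opt, (a1, len1, a2) = best_two_pulse()
```

Under these assumptions the search places one pulse of length 0.2 at each end of the interval, giving $|\mathcal{I}(T)| = 0.0784/3 \approx 0.026$, more than ten times the value $0.4^4/12 \approx 0.0021$ obtained by any single pulse of the same energy. Only three parameters are searched, which is consistent with the remark that little computation is involved.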

Since our optimum is "on-off", it is an infinite-bandwidth waveform and can never be exactly implemented. However, if the rise time of our band-limited transmitter is small compared to the total time duration T, the mathematically optimum waveform provides a good basis for design.

It is important to remember that the radar model of Sec. 2 is limited in scope and was chosen primarily to illustrate the technique of optimum waveform design. Therefore, the above conclusions on the shape of optimum radar waveforms are only examples of the type of conclusions possible and are actually applicable to only special types of real radar problems.


* This is a valid conclusion only after verification that S(t) is not identically zero for all time.
8. THE COMMUNICATION EXAMPLE

Consider the communication problem first stated in Sec. 2. The differential equation for $\mathcal{I}(t)$ is obtained from Eq. (3.27) as

$$\frac{d\mathcal{I}(t)}{dt} = \frac{1}{\sigma^2}\,x(t)x'(t) \qquad (8.1)$$


where x(t) is the vector of the waveforms $x_k(t)$. Each of the M possible waveforms is to be constrained with respect to total energy and bandwidth. To handle the energy constraints, we want to constrain

$$\int_0^T x_k^2(t)\,dt, \qquad k = 1, \dots, M.$$

However, as discussed in Sec. 4, the main diagonal elements of $\mathcal{I}(T)$ are proportional to the total waveform energy for the case of white observation noise. Thus, the energy constraints are handled by

$$\mathcal{I}_{kk}(T) = e_1. \qquad (8.2)$$


As discussed in Sec. 5, there are various ways to handle a bandwidth constraint, and the choice depends on the particular problem. For this example, we constrain the second moment of the energy spectrum. Thus we define*

$$\frac{dx(t)}{dt} = u(t) \qquad (8.3)$$

$$\frac{dr(t)}{dt} = u^2(t) \qquad (8.4)$$

where u(t) is the vector of the M controls, $u_k(t)$, one for each $x_k(t)$, and $u^2(t)$ is (unconventional notation) the vector of the $u_k^2(t)$. The bandwidth constraint is then

$$r_k(T) \le e_2, \qquad k = 1, \dots, M \qquad (8.5)$$

where the $r_k(t)$ are the elements of r(t).

* Section 5 employed an auxiliary state variable w(t). However, here it is simpler to work with the $x_k$ directly and consider x a state vector.

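For the sake of illustration, we use Eqs. (4.1), (4.5) and (4.6). Equation (4.1) is*

$$\mathcal{I}_{kk}(T) \ge e_3. \qquad (8.6)$$

Rewriting Eq. (4.5) and Eq. (4.6),

$$\mathcal{I}_{kk}(T) - 2\,\mathcal{I}_{jk}(T) + \mathcal{I}_{jj}(T) \ge \ell(T) \qquad (8.7)$$

$$\frac{d\ell(t)}{dt} = 0. \qquad (8.8)$$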


In addition to the boundary conditions of Eqs. (8.5), (8.6) and (8.7), we impose

$$r(0) = 0, \qquad \mathcal{I}(0) = 0, \qquad x(0) = 0, \qquad x(T) = 0.$$

The differential equation for q(t), Eq. (6.2), is the combination of Eqs. (8.1), (8.3), (8.4) and (8.8). Let $P_I(t)$ denote the adjoint (matrix) corresponding to $\mathcal{I}(t)$, $p_r(t)$ the adjoint (vector) corresponding to r(t), $p_x(t)$ the adjoint (vector) corresponding to x(t), and $p_\ell(t)$ the adjoint (scalar) corresponding to $\ell(t)$. The Hamiltonian then becomes

$$H = \frac{1}{\sigma^2}\operatorname{tr}\left[P_I(t)x(t)x'(t)\right] + p_x'(t)u(t) + p_r'(t)u^2(t) + (0)\,p_\ell(t).$$

* Obviously, $e_3$ of Eq. (8.6) cannot be greater than $e_1$ of Eq. (8.2).
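Now

$$\frac{dp_r(t)}{dt} = -\frac{\partial H}{\partial r(t)} = 0, \qquad \frac{dP_I(t)}{dt} = -\frac{\partial H}{\partial \mathcal{I}(t)} = 0,$$

which means $p_r(t)$ and $P_I(t)$ are constants, $p_r$ and $P_I$, respectively. Similarly,

$$\frac{dp_x(t)}{dt} = -\frac{\partial H}{\partial x(t)} = -\frac{2}{\sigma^2}\,P_I\,x(t). \qquad (8.9)$$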


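Let $p_r^{-1}$ denote (unconventional notation) the diagonal matrix formed by the reciprocals of the elements of $p_r$.

Minimization of the Hamiltonian with respect to u(t) is obtained by partial differentiation. Setting $\partial H/\partial u_k(t) = p_{x,k}(t) + 2p_{r,k}\,u_k(t) = 0$ for each k (a minimum when the elements of $p_r$ are positive) gives, for the optimum,

$$u^\circ(t) = -\frac{1}{2}\,p_r^{-1}\,p_x(t). \qquad (8.10)$$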


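Substituting Eq. (8.10) into Eq. (8.3) and combining with Eq. (8.9) gives

$$\frac{d}{dt}\begin{bmatrix} x(t) \\ p_x(t) \end{bmatrix} = \begin{bmatrix} 0 & -\frac{1}{2}p_r^{-1} \\ -\frac{2}{\sigma^2}P_I & 0 \end{bmatrix} \begin{bmatrix} x(t) \\ p_x(t) \end{bmatrix} \qquad (8.11)$$

or, equivalently,

$$\frac{dx(t)}{dt} = -\frac{1}{2}\,p_r^{-1}\,p_x(t) \qquad (8.12)$$

$$\frac{d^2p_x(t)}{dt^2} = \frac{1}{\sigma^2}\,P_I\,p_r^{-1}\,p_x(t). \qquad (8.13)$$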
Equation (8.11), or Eqs. (8.12) and (8.13), specify the form of the optimum waveforms. $P_I$ and $p_r$ are constant. Thus the optimum waveforms are generated by a system of 2M first-order, constant-coefficient, linear, time-invariant differential equations (Eq. (8.11)), or by a simple system of M second-order, constant-coefficient, linear, time-invariant differential equations (Eq. (8.13)) followed by M integrators (Eq. (8.12)). Such systems and their corresponding matched-filter receivers are easy to build, and thus the optimum results in a system which can be readily implemented.
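The generator structure of Eq. (8.11) is easy to simulate. The sketch below builds the 2M × 2M coefficient matrix from $P_I$, $p_r$ and $\sigma^2$ and integrates it with a classical fourth-order Runge-Kutta scheme. The numerical values are hypothetical: M = 1, $p_r > 0$ so that H has a minimum in u, and $P_I < 0$ so that Eq. (8.13) becomes $d^2p_x/dt^2 = -4p_x$; T = π is chosen so that x(T) = 0 happens to hold. In the actual problem these constants come from the boundary and transversality conditions.

```python
import numpy as np

def system_matrix(P_I, p_r, sigma2):
    """Coefficient matrix of Eq. (8.11) acting on the stacked state [x; p_x]."""
    m = len(p_r)
    zero = np.zeros((m, m))
    top = np.hstack([zero, -0.5 * np.diag(1.0 / np.asarray(p_r, dtype=float))])
    bottom = np.hstack([-(2.0 / sigma2) * np.asarray(P_I, dtype=float), zero])
    return np.vstack([top, bottom])

def simulate(F, z0, T, n_steps=2000):
    """Integrate dz/dt = F z from z(0) = z0 with the classical RK4 method."""
    h = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    z = np.empty((n_steps + 1, len(z0)))
    z[0] = z0
    for k in range(n_steps):
        y = z[k]
        k1 = F @ y
        k2 = F @ (y + 0.5 * h * k1)
        k3 = F @ (y + 0.5 * h * k2)
        k4 = F @ (y + h * k3)
        z[k + 1] = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return t, z

# Hypothetical M = 1 values: p_r > 0 and P_I < 0, so d^2 p_x/dt^2 = -4 p_x
F = system_matrix(P_I=[[-4.0]], p_r=[1.0], sigma2=1.0)
t, z = simulate(F, z0=np.array([0.0, 1.0]), T=np.pi)   # x(0) = 0, p_x(0) = 1
# Closed form for these values: p_x(t) = cos 2t and x(t) = -sin(2t)/4
```

The same two functions apply unchanged for M > 1; only the sizes of $P_I$ and $p_r$ change.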

There remains, of course, the determination of the initial conditions, $p_x(0)$, and the values of $P_I$ and $p_r$. These quantities are specified by the boundary conditions and transversality conditions and, of course, depend on Eqs. (8.1) and (8.4). Since all the necessary equations can be integrated in closed form, the two-point boundary value problem can be circumvented if desired. However, we have not attempted an actual solution.

As with the radar example, remember that the basic model which leads to the preceding conclusions was chosen to illustrate the theory, not for realism.
9. DISCUSSION


We have relied primarily on simple examples to indicate the breadth of the methodology of optimum signal design. Therefore, let us briefly discuss the complete range of applications. Obviously, only samplings of possible criteria and constraints were given. Discrete-time problems can be solved. The complete properties of the received signal are determined by its probability distribution, and a finite number of state variables (such as an information matrix) can completely determine this distribution only in special cases. Thus it will sometimes be necessary to directly control the probability distributions of the received signal, and this can lead to partial differential equations and infinite-dimensional state systems. The necessary optimization theory is still in its infancy, but work is progressing. We considered only the design of deterministic waveforms.* However, the basic technique of combining a state variable model with Pontryagin's Maximum Principle is equally applicable to the design of a probability distribution, f(x). The independent variable, t, in x(t), is simply replaced by x, in f(x). The design of optimum power spectra as discussed in Ref. 1 can also be done in a similar fashion. Note that the Maximum Principle is essential for such problems, as the classical calculus of variations does not allow the incorporation of the necessary constraint that the probability density (or the power spectrum) never be negative.

Our general discussions ended with the optimum waveform specified as a solution to a two-point boundary value problem; that is, the optimum waveform was parameterized by a finite number of parameters (the boundary conditions) which still remained to be evaluated. It is interesting to compare this result with the design technique wherein the range of possible waveforms is initially parameterized and then the optimum values of the parameters determined. (For example, the waveform is restricted to be a linear combination of some chosen, finite set of orthonormal functions.) Our parameterization is automatically specified by the inherent dynamics of the information, the chosen criterion and the applied constraints. Since the resulting parameters are natural to the problem, solution for their optimum values is easier, as the theory discussed at the end of Sec. 6 is available. In addition, this parameterization of the optimum signal has its own intrinsic value. For example, we learn immediately that a pulse radar is better than a CW radar and that the optimum communication waveforms are easy to generate.*

* These include the case of some stochastic signals, as illustrated by the laser radar example of Sec. 3.

We have drawn heavily on the concepts and results of optimal control theory. Although analogies abound, there is one important and felicitous difference. In control problems, the optimum control is usually wanted as a function of the state of the system; that is, a feedback control law. In signal design we are satisfied with the waveform (control) as a function of time. Thus, many techniques (such as iterative search on boundary conditions) which are inappropriate for most realistic control problems can be used for waveform design.

The basic principles of optimum waveform design are very simple. All work is done in the time domain. The pertinent properties of the received signal are explicitly displayed by the state variable model of the information. Signal constraints and the chosen criteria are handled in a natural manner and are automatically incorporated and displayed through the appropriate state variables. The resulting system of differential (difference) equations succumbs easily to the calculus of variations (Pontryagin's Maximum Principle), which provides necessary conditions the optimum waveforms must satisfy. Actual evaluation of the optimum waveform may not be easy, but the equations are well adapted to modern computers and many avenues are open. Realistic problems can be solved.


* Naturally, these statements apply only in the context of the limited models used here!
APPENDIX


The definition of information used in Sec. 3 is not the definition commonly employed in communication theory. Therefore, we shall relate the two concepts. Much of the discussion is extracted from the first two chapters of Ref. 6, often nearly verbatim.

Assume the random variable, z, is observed and that a decision between the two hypotheses,

$$H_1: f_1(z), \qquad H_2: f_2(z) \qquad (A.1)$$


must  be  made.  It  is  well-known  in  communication  and  statistical  theory,  (see 

Refs.  6,  7,  17)  that  the  likelihood  ratio,  f  (z)/f^(z),  forms  the  basis  for  most  hypothesis 

tests  and  is  also  fundamental  to  the  theory  of  parameter  estimation.  For  a  particular 

observation,  z,  the  larger  f  (z)/f  (z)  the  more  likely  we  are  to  decide  in  favor  of  H  . 

i  z  l 

Thus,  log  f  (z)/f  (z)  measures,  in  some  sense,  the  information  gained  in  favor  of  H 
i  z  1 

over  To  show  why  log  f  (z)/{^(z)  is  an  especially  appropriate  measure,  assume 
the  existence  of  a  priori  probabilities,  P(H  ),  j  =  1,2,  on  the  two  hypotheses.  Let 
P(H^/z)  j  =  1,2,  denote  the  conditional  probability  of  given  the  sample  z.  By 
Bayes*  theorem, 


P(H  )  f  (z) 

r(Hj/z)  =  Pfl^f^z)  +P(H2)f2(z)  J=1,2 

from  which 

fj(z)  P(Hj/z)  P(H 

l0gr^"l0gp(H75"l08W  ' 

Thus, $\log f_1(z)/f_2(z)$ is the difference between the logarithm of the "odds" in favor of
$H_1$ over $H_2$ as calculated after observing z and the logarithm of the odds as calculated
before observing z. The question of when and where a priori distributions such as $P(H_j)$
should be used is the subject of many debates, but fortunately it need not be pursued here:
$\log f_1(z)/f_2(z)$ is well defined whether or not the $P(H_j)$ exist. If the $P(H_j)$ are known,
their effect is additive.
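As a quick numerical check of the identity above, the following Python sketch evaluates $\log f_1(z)/f_2(z)$ directly and again as posterior log-odds minus prior log-odds. The Gaussian densities chosen for $f_1$ and $f_2$ and the prior values are hypothetical, picked only for illustration; any pair of positive densities would serve.

```python
import math

def normal_pdf(z, mean, sd):
    """Gaussian density; used here only as an illustrative choice of f_1 and f_2."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def log_likelihood_ratio(z, f1, f2):
    """log f_1(z)/f_2(z), computed directly."""
    return math.log(f1(z) / f2(z))

def log_odds_change(z, f1, f2, p1):
    """Posterior log-odds minus prior log-odds of H_1 over H_2, via Bayes' theorem."""
    p2 = 1.0 - p1
    evidence = p1 * f1(z) + p2 * f2(z)
    post1 = p1 * f1(z) / evidence
    post2 = p2 * f2(z) / evidence
    return math.log(post1 / post2) - math.log(p1 / p2)

f1 = lambda z: normal_pdf(z, 1.0, 1.0)  # hypothetical density under H_1
f2 = lambda z: normal_pdf(z, 0.0, 1.0)  # hypothetical density under H_2

# The two expressions agree for every prior and observation tried.
for p1 in (0.1, 0.5, 0.9):
    for z in (-1.0, 0.4, 2.5):
        assert math.isclose(log_likelihood_ratio(z, f1, f2),
                            log_odds_change(z, f1, f2, p1), abs_tol=1e-12)
```

For these two unit-variance densities the log likelihood ratio reduces to $z - 1/2$, and the loop confirms that the prior cancels out, as the text states.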

The mean information, $I(1{:}2/z)$, as defined in Sec. 3, is simply the average value
of $\log f_1(z)/f_2(z)$ assuming $H_1$ is true:

$$I(1{:}2/z) = \int f_1(z) \log \frac{f_1(z)}{f_2(z)}\, dz . \tag{A.2}$$


Now, in general,

$$I(1{:}2/z) \neq I(2{:}1/z) .$$
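A small discrete example makes the asymmetry concrete. The sketch below replaces the integral of Eq. (A.2) with a sum over two outcomes; the distributions `f1` and `f2` are hypothetical, and natural logarithms are assumed, so the results are in nats.

```python
import math

def mean_information(p, q):
    """Discrete analogue of Eq. (A.2): sum_k p_k log(p_k / q_k), natural log.
    Terms with p_k == 0 contribute zero; q_k must be positive wherever p_k is."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

f1 = [0.5, 0.5]  # hypothetical distribution of z under H_1
f2 = [0.9, 0.1]  # hypothetical distribution of z under H_2

i12 = mean_information(f1, f2)  # I(1:2), about 0.511 nats
i21 = mean_information(f2, f1)  # I(2:1), about 0.368 nats
print(f"I(1:2) = {i12:.3f}, I(2:1) = {i21:.3f}")
```

Swapping the roles of the two hypotheses changes which distribution does the averaging, which is why the two values differ.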

This sometimes makes the concept of divergence, $J(1{:}2/z)$, a useful extension, where

$$J(1{:}2/z) = I(1{:}2/z) + I(2{:}1/z) = \int \left[ f_1(z) - f_2(z) \right] \log \frac{f_1(z)}{f_2(z)}\, dz .$$
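For two Gaussian densities with a common variance $\sigma^2$ and means $\mu_1$, $\mu_2$, a standard closed-form result gives $I(1{:}2/z) = I(2{:}1/z) = (\mu_1 - \mu_2)^2/(2\sigma^2)$, and hence $J(1{:}2/z) = (\mu_1 - \mu_2)^2/\sigma^2$; this is one case in which the two directed quantities coincide. The sketch below checks this by numerical quadrature of Eq. (A.2) and of the divergence integral. The parameter values, the truncation of the integration range to [-20, 20], and the midpoint rule are assumptions of the sketch, not of the report.

```python
import math

def normal_pdf(z, mean, sd):
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def integrate(fn, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]; adequate for smooth, rapidly decaying integrands."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(n))

def directed_divergence(f1, f2, lo=-20.0, hi=20.0):
    """Eq. (A.2): integral of f1 log(f1/f2), truncated to [lo, hi]."""
    return integrate(lambda z: f1(z) * math.log(f1(z) / f2(z)), lo, hi)

def divergence(f1, f2, lo=-20.0, hi=20.0):
    """J(1:2) = integral of (f1 - f2) log(f1/f2), truncated to [lo, hi]."""
    return integrate(lambda z: (f1(z) - f2(z)) * math.log(f1(z) / f2(z)), lo, hi)

mu1, mu2, sigma = 1.0, -0.5, 2.0  # hypothetical parameters
f1 = lambda z: normal_pdf(z, mu1, sigma)
f2 = lambda z: normal_pdf(z, mu2, sigma)

i12 = directed_divergence(f1, f2)
i21 = directed_divergence(f2, f1)
j = divergence(f1, f2)
closed_form = (mu1 - mu2) ** 2 / (2 * sigma ** 2)  # 0.28125 for these parameters
print(f"I(1:2) = {i12:.6f}, I(2:1) = {i21:.6f}, J = {j:.6f}")
```

The quadrature reproduces the closed form for both directed quantities, and `j` equals their sum, matching the first equality in the definition of $J(1{:}2/z)$.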

The divergence, $J(1{:}2/z)$, has all the properties of a distance (or metric) except for the
triangle inequality. In this context, $I(1{:}2/z)$ can be considered a directed divergence.
The choice between using $J(1{:}2/z)$ and $I(1{:}2/z)$ depends on the problem.*

* In the communication and radar problems of Sec. 2, $I(1{:}2/z) = I(2{:}1/z)$, and thus no
decision was required.

Now consider the definition of information commonly employed in communication
theory (see Sec. 3.5 of Ref. 25). Assume x is transmitted with probability density
g(x) and z is received with probability density h(z). Let f(x, z) be the joint probability
density. The quantity, I, given by

$$I = \iint f(x,z) \log \frac{f(x,z)}{g(x)\, h(z)}\, dx\, dz \tag{A.3}$$

is called the mean information in z about x, or the mean mutual information between
x and z. The definition of Eq. (A.3) can be related to the rate of transmission of
information and the channel capacity (see Ref. 25). Section 2.5 of Ref. 6 also discusses
these concepts.

The mutual information of Eq. (A.3) is a special case of Eq. (A.2) in the
following sense. If

$H_1$: x and z are dependent, with joint density f(x, z),

$H_2$: x and z are independent, with densities g(x) and h(z),

then, with the integral of Eq. (A.2) taken over the pair (x, z),

$$I = I(1{:}2/z) .$$
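The equivalence can be checked on a discrete channel. The sketch below evaluates the discrete form of Eq. (A.3) as the directed divergence between the joint distribution (playing the role of $H_1$) and the product of its marginals (playing the role of $H_2$), then compares the result with the entropy identity $H(x) + H(z) - H(x, z)$. The binary symmetric channel with crossover probability 0.1 and equiprobable inputs is a hypothetical example, and natural logarithms are assumed.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mutual_information(joint):
    """Discrete form of Eq. (A.3): sum of f(x,z) log[f(x,z) / (g(x) h(z))],
    i.e. the directed divergence of the joint from the product of marginals."""
    g = [sum(row) for row in joint]        # marginal of x
    h = [sum(col) for col in zip(*joint)]  # marginal of z
    return sum(
        fxz * math.log(fxz / (g[i] * h[k]))
        for i, row in enumerate(joint)
        for k, fxz in enumerate(row)
        if fxz > 0
    )

# Hypothetical binary symmetric channel with equiprobable inputs.
eps = 0.1
joint = [[0.5 * (1 - eps), 0.5 * eps],
         [0.5 * eps, 0.5 * (1 - eps)]]

mi = mutual_information(joint)
check = (entropy([sum(r) for r in joint])
         + entropy([sum(c) for c in zip(*joint)])
         - entropy([p for r in joint for p in r]))
print(f"I = {mi:.6f} nats (entropy identity gives {check:.6f})")
```

For this channel both routes give $\ln 2$ minus the binary entropy of the crossover probability.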

In  Sec.  3  we  developed  the  information  matrix  concept  using  the  definition  of  Eq.  (A.  2) 
and  in  Sec.  4  discussed  the  choice  of  a  scalar  function  of  this  matrix  which  is  to  be 
maximized.  Since  Eq.  (A.  3)  is  a  scalar  measure  of  information,  it  can  be  considered 
simply  another  criterion  which  could  have  been  included  in  Sec.  4. 

Equation  (A.  3)  could  be  employed  in  the  communication  example  of  Sec.  2  if  we 
assign  an  a  priori  probability  distribution  to  the  a  .  However,  such  an  approach 
obscures  the  basic  nature  of  the  problem  as  the  various  trade-offs  between  type  1  and  2 
errors are not displayed. Thus, specializations of Eq. (A.2) other than Eq. (A.3)
were  deemed  more  appropriate. 